282 research outputs found
Fast deterministic processor allocation
Interval allocation has been suggested as a possible formalization, for the PRAM, of the (vaguely defined) processor allocation problem, which is of fundamental importance in parallel computing. The interval allocation problem is, given nonnegative integers x_1, …, x_n, to allocate n nonoverlapping subarrays of sizes x_1, …, x_n from within a base array of O(x_1 + ⋯ + x_n) cells. We show that interval allocation problems of size n can be solved in time with optimal speedup on a deterministic CRCW PRAM. In addition to a general solution to the processor allocation problem, this implies an improved deterministic algorithm for the problem of approximate summation. For both interval allocation and approximate summation, the fastest previous deterministic algorithms have running times of . We also describe an application to the problem of computing the connected components of an undirected graph
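As a concrete sequential illustration of the problem being formalized, here is a minimal Python sketch; the function name and the prefix-sum approach are illustrative choices, not the paper's, and a PRAM algorithm would compute the prefix sums in parallel rather than in a loop.

```python
# Sequential sketch of the interval allocation problem: given sizes
# x_1, ..., x_n, return disjoint subarrays (as offset/size pairs) inside
# a base array of x_1 + ... + x_n cells. Exclusive prefix sums give each
# request its starting offset; consecutive blocks cannot overlap.
from itertools import accumulate

def interval_allocation(sizes):
    """Return (offset, size) pairs of nonoverlapping subarrays."""
    offsets = [0] + list(accumulate(sizes))[:-1]  # exclusive prefix sums
    return list(zip(offsets, sizes))

allocs = interval_allocation([3, 1, 4])
# Each block starts exactly where the previous one ends.
assert allocs == [(0, 3), (3, 1), (4, 4)]
```

In the parallel setting, the prefix-sum step is the whole difficulty: the deterministic time bound for it dominates the allocation.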
On a compaction theorem of ragde
Ragde demonstrated that in constant time a PRAM with n processors can move at most k items, stored in distinct cells of an array of size n, to distinct cells in an array of size at most k^4. We show that the exponent of 4 in the preceding sentence can be replaced by any constant greater than 2
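To make the statement concrete, here is a trivial sequential sketch of the compaction task; all names are illustrative. The hard part, which Ragde's theorem and this paper address, is achieving the same effect in constant time on a PRAM, without any processor scanning the large array.

```python
# Sequential sketch of the compaction task: a few items occupy distinct
# cells of a large, mostly empty array; collect them into a small dense
# array. Sequentially this is a single scan; the PRAM results do it in
# constant time.
def compact(sparse):
    """Collect the non-empty cells of `sparse` into a dense array."""
    return [v for v in sparse if v is not None]

big = [None, 7, None, None, 2, None, 9, None]
assert compact(big) == [7, 2, 9]  # 3 items fit in a target array of size 3
```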
Optimal parallel string algorithms: sorting, merging and computing the minimum
We study fundamental comparison problems on strings of characters, equipped with the usual lexicographical ordering. For each problem studied, we give a parallel algorithm that is optimal with respect to at least one criterion for which no optimal algorithm was previously known. Specifically, our main results are:
- Two sorted sequences of strings, containing altogether n characters, can be merged in time using operations on an EREW PRAM. This is optimal as regards both the running time and the number of operations.
- A sequence of strings, containing altogether n characters represented by integers of size polynomial in n, can be sorted in time using operations on a CRCW PRAM. The running time is optimal for any polynomial number of processors.
- The minimum string in a sequence of strings containing altogether n characters can be found using (expected) operations in constant expected time on a randomized CRCW PRAM, in time on a deterministic CRCW PRAM with a program depending on n, in time on a deterministic CRCW PRAM with a program not depending on n, in expected time on a randomized EREW PRAM, and in time on a deterministic EREW PRAM. The number of operations is optimal, and the running time is optimal for the randomized algorithms and, if the number of processors is limited to n, for the nonuniform deterministic CRCW PRAM algorithm as well
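A sequential sketch of the first result's task, merging two lexicographically sorted string sequences while also reporting every input element's rank in the merged output; the names are illustrative, and the paper's contribution is the parallel time/operation bounds, not the merge itself.

```python
# Merge two sorted sequences of strings under lexicographic order and
# report, for each input element, its position (rank) in the output.
def merge_with_ranks(a, b):
    merged, rank_a, rank_b = [], [], []
    i = j = 0
    while i < len(a) or j < len(b):
        if j == len(b) or (i < len(a) and a[i] <= b[j]):
            rank_a.append(len(merged)); merged.append(a[i]); i += 1
        else:
            rank_b.append(len(merged)); merged.append(b[j]); j += 1
    return merged, rank_a, rank_b

m, ra, rb = merge_with_ranks(["ant", "bee"], ["ape", "cow"])
assert m == ["ant", "ape", "bee", "cow"]
assert ra == [0, 2] and rb == [1, 3]
```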
Improved parallel integer sorting without concurrent writing
We show that n integers in the range 1..n can be stably sorted on an EREW PRAM using time and operations, for arbitrary given , and on a CREW PRAM using time and operations, for arbitrary given . In addition, we are able to sort n arbitrary integers on a randomized CREW PRAM within the same resource bounds with high probability. In each case our algorithm is a factor of almost closer to optimality than all previous algorithms for the stated problem in the stated model, and our third result matches the operation count of the best known sequential algorithm. We also show that n integers in the range 1..m can be sorted in time with operations on an EREW PRAM using a nonstandard word length of bits, thereby greatly improving the upper bound on the word length necessary to sort integers with a linear time-processor product, even sequentially. Our algorithms were inspired by, and in one case directly use, the fusion trees of Fredman and Willard
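The sequential baseline behind such results is stable counting sort, which orders keys from a small range in linear time while preserving the input order of equal keys; a minimal sketch with illustrative names (radix sort extends it to larger ranges):

```python
# Stable counting sort of (key, value) pairs with keys in 0..max_key.
# Stability means pairs with equal keys keep their input order, which is
# what "stably sorted" refers to in the abstract above.
def counting_sort(pairs, max_key):
    count = [0] * (max_key + 1)
    for k, _ in pairs:
        count[k] += 1
    # exclusive prefix sums give each key its first output position
    pos, total = [0] * (max_key + 1), 0
    for k in range(max_key + 1):
        pos[k], total = total, total + count[k]
    out = [None] * len(pairs)
    for k, v in pairs:          # scanning in input order keeps it stable
        out[pos[k]] = (k, v)
        pos[k] += 1
    return out

pairs = [(2, "a"), (1, "b"), (2, "c")]
assert counting_sort(pairs, 2) == [(1, "b"), (2, "a"), (2, "c")]
```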
Fast integer merging on the EREW PRAM
We investigate the complexity of merging sequences of small integers on the EREW PRAM. Our most surprising result is that two sorted sequences of bits each can be merged in time. More generally, we describe an algorithm to merge two sorted sequences of integers drawn from the set in time using an optimal number of processors. No sublogarithmic merging algorithm for this model of computation was previously known. The algorithm not only produces the merged sequence, but also computes the rank of each input element in the merged sequence. On the other hand, we show a lower bound of on the time needed to merge two sorted sequences of length each with elements in the set , implying that our merging algorithm is as fast as possible for . If we impose an additional stability condition requiring the ranks of each input sequence to form an increasing sequence, then the time complexity of the problem becomes , even for . Stable merging is thus harder than nonstable merging
Succinct Indexable Dictionaries with Applications to Encoding k-ary Trees, Prefix Sums and Multisets
We consider the indexable dictionary problem, which consists of storing a set S ⊆ {0, …, m−1} for some integer m, while supporting the operations of Rank(x), which returns the number of elements in S that are less than x if x ∈ S, and -1 otherwise; and Select(i), which returns the i-th smallest element in S. We give a data structure that supports both operations in O(1) time on the RAM model and requires B(n,m) + o(n) + O(lg lg m) bits to store a set of size n, where B(n,m) = ⌈lg (m choose n)⌉ is the minimum number of bits required to store any n-element subset from a universe of size m. Previous dictionaries taking this space only supported (yes/no) membership queries in O(1) time. In the cell probe model we can remove the O(lg lg m) additive term in the space bound, answering a question raised by Fich and Miltersen, and Pagh.
We present extensions and applications of our indexable dictionary data structure, including:
- An information-theoretically optimal representation of a k-ary cardinal tree that supports standard operations in constant time,
- A representation of a multiset of size n from a universe of size m in bits that supports (appropriate generalizations of) Rank and Select operations in constant time, and
- A representation of a sequence of n non-negative integers summing up to m in bits that supports prefix sum queries in constant time.
Comment: Final version of SODA 2002 paper; supersedes Leicester Tech report 2002/1
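A deliberately non-succinct sketch of the Rank/Select interface just described, storing the set as a plain sorted list (names and 0-based indexing are illustrative choices); the paper's point is supporting both operations in O(1) time within essentially the information-theoretic space bound, which this sketch does not attempt.

```python
# Naive indexable dictionary: Rank(x) = number of elements < x if x is
# in the set (else -1); Select(i) = the i-th smallest element (0-based).
import bisect

class IndexableDictionary:
    def __init__(self, elements):
        self.s = sorted(elements)

    def rank(self, x):
        """Number of elements less than x if x is in the set, else -1."""
        i = bisect.bisect_left(self.s, x)
        return i if i < len(self.s) and self.s[i] == x else -1

    def select(self, i):
        """The i-th smallest element (0-based)."""
        return self.s[i]

d = IndexableDictionary({3, 9, 14})
assert d.rank(9) == 1 and d.rank(5) == -1
assert d.select(2) == 14
```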
Fast Breadth-First Search in Still Less Space
It is shown that a breadth-first search in a directed or undirected graph with n vertices and m edges can be carried out in time with bits of working memory
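For contrast, textbook BFS keeps an explicit queue and a distance table, which already costs on the order of n log n bits of working memory; a minimal sketch (names are illustrative):

```python
# Standard BFS over an adjacency-list graph, returning distances from
# the source. The queue and the dist table are exactly the working
# memory that space-efficient BFS algorithms aim to shrink.
from collections import deque

def bfs(adj, source):
    """Return a dict of BFS distances from `source`."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

adj = {0: [1, 2], 1: [3], 2: [3], 3: []}
assert bfs(adj, 0) == {0: 0, 1: 1, 2: 1, 3: 2}
```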
Matching Subsequences in Trees
Given two rooted, labeled trees P and T the tree path subsequence problem is to determine which paths in P are subsequences of which paths in T. Here a path begins at the root and ends at a leaf. In this paper we propose this problem as a useful query primitive for XML data, and provide new algorithms improving the previously best known time and space bounds. Comment: Minor correction of typos, etc
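A naive quadratic sketch of the tree path subsequence problem, comparing every root-to-leaf path of one tree against every path of the other with a greedy subsequence test; the tree encoding and names are illustrative, and the paper's algorithms improve on exactly this kind of pairwise comparison.

```python
# Greedy subsequence test: p is a subsequence of t if its labels appear
# in t in order (not necessarily contiguously).
def is_subsequence(p, t):
    it = iter(t)
    return all(c in it for c in p)

# Tree encoded as {node: (label, [children])}; yield root-to-leaf label
# sequences.
def root_to_leaf_paths(tree, node, prefix=()):
    label, children = tree[node]
    path = prefix + (label,)
    if not children:
        yield path
    for c in children:
        yield from root_to_leaf_paths(tree, c, path)

T = {0: ("a", [1, 2]), 1: ("b", []), 2: ("c", [3]), 3: ("b", [])}
paths = list(root_to_leaf_paths(T, 0))
assert paths == [("a", "b"), ("a", "c", "b")]
assert is_subsequence(("a", "b"), ("a", "c", "b"))
```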
A Lower-Bound for the Emulation of PRAM Memories on Processor Networks
We show a lower bound of Ω(min{log m, √n}) on the slowdown of any deterministic emulation of a PRAM memory with m cells and n I/O ports on an n-processor bounded-degree network. The bound is weak; unlike all previous bounds, however, it does not depend on the unnatural assumption of point-to-point communication which says, roughly, that messages in transit cannot be duplicated by intermediate processors. For m sufficiently large relative to n, the new bound implies the optimality of a simple emulation on a mesh-of-trees network
Succinct Partial Sums and Fenwick Trees
We consider the well-studied partial sums problem in succinct space where one is to maintain an array of n k-bit integers subject to updates such that partial sums queries can be efficiently answered. We present two succinct versions of the Fenwick Tree, which is known for its simplicity and practicality. Our results hold in the encoding model where one is allowed to reuse the space from the input data. Our main result is the first that only requires nk + o(n) bits of space while still supporting sum/update in O(log_b n) / O(b log_b n) time where 2 <= b <= log^O(1) n. The second result shows how optimal time for sum/update can be achieved while only slightly increasing the space usage to nk + o(nk) bits. Beyond Fenwick Trees, the results are primarily based on bit-packing and sampling, making them very practical, and they also allow for simple optimal parallelization
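For reference, here is a plain, non-succinct Fenwick tree, the structure the paper compresses; point updates and prefix-sum queries both take O(log n) time (the names below are illustrative).

```python
# Classic Fenwick (binary indexed) tree: tree[i] stores the sum of a
# block of elements ending at position i, where the block length is the
# lowest set bit of i.
class FenwickTree:
    def __init__(self, n):
        self.tree = [0] * (n + 1)   # 1-indexed internal array

    def update(self, i, delta):
        """Add `delta` to element i (1-indexed)."""
        while i < len(self.tree):
            self.tree[i] += delta
            i += i & -i             # jump to the next covering node

    def prefix_sum(self, i):
        """Sum of elements 1..i."""
        s = 0
        while i > 0:
            s += self.tree[i]
            i -= i & -i             # drop the lowest set bit
        return s

ft = FenwickTree(8)
for pos, val in [(1, 5), (3, 2), (7, 4)]:
    ft.update(pos, val)
assert ft.prefix_sum(3) == 7 and ft.prefix_sum(8) == 11
```

The succinct versions in the paper keep this access pattern but pack the counters so the whole structure fits in nk + o(n) bits.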